reinforcement learning

Terms from Artificial Intelligence: humans at the heart of algorithms

Page numbers are for the draft copy at present; they will be replaced with correct numbers when the final book is formatted. Chapter numbers are correct and will not change.

Reinforcement learning is used to learn complex time- and situation-dependent behaviours when there is either no training data or it is insufficient. Often the points at which rewards occur are well after the action that caused them and may be the result of several past actions, leading to credit assignment problems, which make positive or negative reinforcement hard. Learning takes place through interactions with a (real or simulated) world and therefore has a cost, both directly due to the action being performed (energy expenditure for a robot, network costs for a web agent) and indirectly due to the positive or negative effects of the action. However, without taking actions there is no potential for learning; this leads to an exploration-exploitation trade-off.
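The exploration-exploitation trade-off can be illustrated with a minimal sketch. It assumes a simulated world with a small number of actions whose rewards are noisy, and uses an epsilon-greedy rule, one common strategy chosen here for illustration rather than a method taken from the book. The names `run_epsilon_greedy`, `true_means`, `epsilon` and `action_cost` are hypothetical.

```python
import random


def run_epsilon_greedy(true_means, steps=5000, epsilon=0.1,
                       action_cost=0.01, seed=0):
    """Epsilon-greedy agent in a simulated world (illustrative sketch).

    true_means  -- assumed hidden mean reward of each action
    epsilon     -- probability of exploring instead of exploiting
    action_cost -- assumed fixed direct cost paid for every action taken
    """
    rng = random.Random(seed)
    n_actions = len(true_means)
    estimates = [0.0] * n_actions  # learned value estimate per action
    counts = [0] * n_actions       # how often each action has been tried
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action to learn more about it.
            action = rng.randrange(n_actions)
        else:
            # Exploit: take the action that currently looks best.
            action = max(range(n_actions), key=lambda a: estimates[a])

        # Noisy reward from the simulated world, minus the cost of acting.
        reward = rng.gauss(true_means[action], 1.0) - action_cost

        # Incremental update of the running mean for this action.
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]
        total_reward += reward

    return estimates, counts, total_reward


if __name__ == "__main__":
    estimates, counts, total = run_epsilon_greedy([0.2, 0.5, 0.9])
    print("estimates:", [round(e, 2) for e in estimates])
    print("counts:", counts)
    print("total reward:", round(total, 1))
```

With `epsilon` set to 0 the agent never deliberately explores and may settle on a poor action; with `epsilon` set to 1 it learns about every action but never makes use of what it has learned. This sketch gives the reward immediately after each action, so it does not show the credit assignment problem that arises when rewards are delayed.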

Defined on page 376

Used on Chap. 6: page 123; Chap. 16: pages 369, 376, 377, 378, 379, 388, 389; Chap. 22: page 546

Also known as reinforcement function, reinforcement learner